Abstract. Suppose we have $n$ keys, $n$ access probabilities for the keys,
and $n + 1$ access probabilities for the gaps between the keys. Let
$h_{\min}(n)$ be the minimal height of a binary search tree for $n$ keys. We
consider the problem of constructing an optimal binary search tree with near
minimal height, i.e. with height $h \le h_{\min}(n) + \Delta$ for some fixed
$\Delta$. It is shown that for any fixed $\Delta$ optimal binary search trees
with near minimal height can be constructed in time $O(n^2)$. This is as fast
as in the unrestricted case.

So far, the best known algorithms for the construction of height-restricted
optimal binary search trees have running time $O(Ln^2)$, where $L$ is the
maximal permitted height. Compared to these algorithms, our algorithm is
faster by a factor of at least $\log_2 n$, because $L$ is bounded from below
by $\log_2 n$.



1 Introduction 

Suppose we have n keys, n access probabilities for the keys, and n + 1 access 
probabilities for the gaps between the keys. The problem to construct a binary 
search tree for these n keys that minimizes the expected access time is known 
as the optimal binary search tree problem. Knuth presented in [B] a well-known 
dynamic programming algorithm that solves this problem in $O(n^2)$ time.

Apart from the original problem, the construction of optimal binary search
trees whose heights are restricted has been considered in the literature. The
height restriction bounds the maximum number of comparisons during a search.
Thus, an optimal height-restricted binary search tree performs well in both
the worst and the average case. Itai [5] and Wessner [10] independently
discovered construction algorithms for height-restricted binary search trees.
Their algorithms have running time $O(Ln^2)$, where $L$ is the maximal
permitted height.

Let $h_{\min}(n) = \lceil \log_2(n + 1) \rceil$ be the minimal height of a
binary search tree for $n$ keys. In this paper, we show that for any fixed
$\Delta$ an optimal binary search tree with height
$h \le h_{\min}(n) + \Delta$ can be constructed in time $O(n^2)$. This
improves the results of Itai and Wessner [5, 10]. Because
$L \ge \lceil \log_2(n + 1) \rceil$, the algorithms of Itai and Wessner have
running time $O(n^2 \log n)$ if we use them to construct optimal search trees
with height $h \le h_{\min}(n) + \Delta$.



Gagie [3J|3] presents an $O(n)$ time algorithm for the restructuring of
optimal binary search trees. His algorithm restructures an existing optimal
binary search tree in such a way that the resulting tree has nearly optimal
height and cost. In contrast to Gagie's algorithm, our algorithm always
selects the best binary search tree from the set of all trees with restricted
height.

Other interesting facts about optimal binary search trees can be found in the
article of Nagaraj [7]. This article gives a comprehensive survey of optimal
binary search trees.

All algorithms for the construction of optimal binary search trees, whether
height restricted or not, are based on dynamic programming. They all construct
larger trees step by step from smaller subtrees. Instead, we use a decision
model in which the keys are placed into the tree by a sequential decision
process in such a way that the costs become optimal.
This approach is adapted from the construction algorithm for
optimal B-trees pQ. 

The rest of the paper is structured in the following way: in Section 2 a formal 
description of the problem is given. In Section 3 we present our approach: the 
decision model is explained and the attached dynamic program is formulated. 
Section 4 states the solution algorithm and gives the complexity results. Section 
5 summarizes the results. 

2 The Problem 

Now we give the problem formulation. We have $n$ keys $k_1 < \dots < k_n$ and
$2n + 1$ probabilities
$\alpha_0, \beta_1, \alpha_1, \beta_2, \dots, \beta_n, \alpha_n$.

The $\beta_i$ are the key weights and the $\alpha_j$ are the gap weights:
$\beta_i$ is the probability that key $k_i$ is requested, and $\alpha_j$ is
the probability that a search is made for a key $d$ with
$k_j < d < k_{j+1}$. We assume that we have artificial keys $k_0 = -\infty$
and $k_{n+1} = \infty$.

Let $b_i$ be the level (depth) of the $i$-th internal node, where key $k_i$ is
stored, and let $a_j$ be the level of the external node for the gap between
$k_j$ and $k_{j+1}$. The root is on level 0. For a binary search tree $T$ we
define the weighted path length $\mathrm{wpl}(T)$ by

$$\mathrm{wpl}(T) := \sum_{i=1}^{n} \beta_i (b_i + 1) + \sum_{j=0}^{n} \alpha_j a_j$$

The weighted path length is the expected number of node visits, i.e.
comparisons, in a search.
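As a small illustration of this definition, the following sketch evaluates the weighted path length from given node levels. The function name `weighted_path_length` and the 0-based list layout are assumptions made for this example, not notation from the paper.

```python
from fractions import Fraction

def weighted_path_length(beta, alpha, key_levels, gap_levels):
    """wpl(T) = sum beta_i * (b_i + 1) + sum alpha_j * a_j.

    beta[i-1], key_levels[i-1]: weight and level of key k_i (i = 1..n).
    alpha[j], gap_levels[j]: weight and level of gap (k_j, k_{j+1}) (j = 0..n).
    """
    if len(beta) != len(key_levels) or len(alpha) != len(gap_levels):
        raise ValueError("weights and levels must have matching lengths")
    if len(alpha) != len(beta) + 1:
        raise ValueError("n keys require n + 1 gaps")
    key_part = sum(w * (lvl + 1) for w, lvl in zip(beta, key_levels))
    gap_part = sum(w * lvl for w, lvl in zip(alpha, gap_levels))
    return key_part + gap_part

# Balanced tree on three keys: k_2 is the root, k_1 and k_3 are its sons,
# and all four gaps are external nodes on level 2.
beta = [Fraction(1, 7)] * 3
alpha = [Fraction(1, 7)] * 4
example_wpl = weighted_path_length(beta, alpha, [1, 0, 1], [2, 2, 2, 2])
```

Exact fractions are used here only to keep the arithmetic free of rounding; floats would work as well.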

The height $h(T)$ of a tree $T$ is defined as the level of the deepest
external node. The minimal height $h_{\min}(n)$ of a binary search tree for
$n$ keys is then given by

$$h_{\min}(n) = \lceil \log_2(n + 1) \rceil$$
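For experiments it can be convenient to evaluate the minimal height without floating point logarithms. The sketch below relies on the fact that, for non-negative integers, the ceiling of the base-2 logarithm of n + 1 equals the bit length of n; the helper name `h_min` is an assumption of this example.

```python
def h_min(n: int) -> int:
    """Minimal height ceil(log2(n + 1)) of a binary search tree for n keys.

    For non-negative integers this equals n.bit_length(), which avoids
    floating point rounding for large n.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return n.bit_length()

# Minimal heights for n = 1, ..., 8 keys.
heights = [h_min(n) for n in range(1, 9)]
```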

We want to construct search trees whose heights are nearly minimal. Let
$\Delta \ge 0$ be some fixed value. The problem is to find a binary search
tree $T$ that minimizes the weighted path length $\mathrm{wpl}(T)$ subject to
the constraint $h(T) \le h_{\min}(n) + \Delta$. Such a tree is called an
optimal binary search tree with near minimal height.



3 Dynamic Programming Model 

We model the process of constructing an optimal binary search tree with near
minimal height as a decision problem with $n$ stages. For every key $k_i$ we
have to decide on which level this key should be placed. Whether placing it on
some level is feasible depends on the former decisions for the keys $k_1$ to
$k_{i-1}$, which define a certain state in the decision process. Placing the
key $k_i$ on any level then increases the weighted path length and leads to a
new state. Both the amount of the increase and the new state depend on our
decision.

Using this approach, the optimal tree is the result of a sequence of optimal
decisions starting in a unique initial state. This leads to a dynamic program
DP of the form $DP = (S_\nu, A_\nu, T_\nu, c_\nu, C_{n+1})$, where $n$ is the
number of stages of DP, $S_\nu$ is the state set of stage $\nu$,
$1 \le \nu \le n + 1$, and $A_\nu$ is the decision set of stage $\nu$,
$1 \le \nu \le n$. The sets $D_\nu \subset S_\nu \times A_\nu$ define the
feasible decisions for the states of stage $\nu$: $(s, a) \in D_\nu$ if and
only if $a$ is feasible in state $s$ on stage $\nu$. The set
$D_\nu(s) := \{a \in A_\nu \mid (s, a) \in D_\nu\}$ contains all feasible
decisions for state $s$ on stage $\nu$. $T_\nu : D_\nu \to S_{\nu+1}$ is the
transition function: making decision $a$ in state $s$ at stage $\nu$ results
in state $T_\nu(s, a)$ at stage $\nu + 1$. $c_\nu : D_\nu \to \mathbb{R}$ is
the cost function of stage $\nu$; $c_\nu(s, a)$ gives the costs that arise if
we make decision $a$ in state $s$ on stage $\nu$.
$C_{n+1} : S_{n+1} \to \mathbb{R}$ is the terminal cost function;
$C_{n+1}(s)$ gives the costs that arise if our final state is $s$.

Now we have to define the components of the dynamic program in such a way that
the decision process models the construction of a binary search tree with
restricted height. First we give the definition of the states. For motivation
take a look at Figure 1. Suppose we have
$h_{\max} := h_{\min}(n) + \Delta = 3$, that means we can place the keys on
levels from 0 to 2.

For the correct placement of a key in the partial tree, only the rightmost
path fragments from the actual root to the node that contains the largest key
are relevant. Due to this fact we can represent a state $s \in S_\nu$ by a
binary vector with $h_{\max}$ components. We number the vector components from
0 to $h_{\max} - 1$. Vector component $s_i$ is related to level $i$.

Each vector component $s_i$ determines whether level $i$ in the rightmost path
is occupied. More formally, vector component $s_i$ is 1 if and only if the
largest key on level $i$ is greater than any key on the levels from 0 to
$i - 1$. For instance






Fig. 1. Tree states in the construction process



the state $s$ resulting from tree (a) in Figure 1 is represented by




and the state s' resulting from tree (c) by 




Observe that different trees may have the same associated state. For instance,
the trees (a) and (b) of Figure 1 are both represented by the same state.

The set $S_\nu$ is defined to be the set of all vectors that are possible
after the assignment of $\nu - 1$ keys. The initial state set $S_1$ consists
of a single state:

$$S_1 = \{(0 \cdots 0)\}$$



A decision is characterized by the level on which a key is placed. So we
define $A = A_\nu = \{0, \dots, h_{\max} - 1\}$. Making decision $a$ means
that the corresponding key is placed on level $a$. For instance, the tree (a)
in Figure 1 is constructed by the decision sequence $DS = (1, 0, 2, 1)$.




Let $s = (s_0, \dots, s_{h_{\max}-1})$ be a state. A feasible decision $a$ for
state $s$ has to fulfill the following conditions:

(i) We can place keys only on unoccupied levels:

$$s_a = 0$$

(ii) If a key is placed above some path fragment, this path fragment has to be
the deepest path fragment and the key has to be placed directly above this
path fragment:

$$\nexists\, i, j : a < i < j \text{ and } s_i = 0 \text{ and } s_j = 1$$

Condition (i) is obvious. Figure 2 demonstrates condition (ii). The next key
has to be placed on level 2, because the key at the top of the deepest path
fragment becomes the left son of the new key. If we placed the new key on
level 1, its left son would not be on the next deeper level.
So we can define

$$D_\nu := \{(s, a) \mid s \in S_\nu,\ a \text{ fulfills (i) and (ii)}\}$$

Observe that the feasible decisions of a state $s$ are independent of the
stage $\nu$. So we define

$$D(s) := \{a \in A \mid a \text{ fulfills (i) and (ii)}\}$$

as the set of feasible decisions for state $s$. For every binary search tree
(with near minimal height) there exists a unique feasible decision sequence
that constructs the tree. As an example, see the decision sequence that
constructs tree (a) of Figure 1 (see above). With this definition, each
feasible decision sequence leads to a tree that is a valid binary search tree,
with the possible exception of the rightmost path. Trees with an invalid
rightmost path on stage $n + 1$ are filtered out by the terminal cost function
$C_{n+1}$ (see below).
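Conditions (i) and (ii) translate directly into a membership test. The following sketch assumes that a state is stored as a Python tuple of 0/1 flags; the function name `feasible_decisions` is hypothetical.

```python
def feasible_decisions(s):
    """Return D(s): all levels a that satisfy conditions (i) and (ii).

    s is a tuple of 0/1 flags; s[i] == 1 means that level i of the
    rightmost path is occupied.
    """
    h_max = len(s)
    result = []
    for a in range(h_max):
        if s[a] != 0:            # condition (i): level a must be unoccupied
            continue
        seen_free = False        # a free level deeper than a was passed
        violates = False
        for i in range(a + 1, h_max):
            if s[i] == 0:
                seen_free = True
            elif seen_free:      # free level i followed by occupied level j > i
                violates = True
                break
        if not violates:         # condition (ii) holds
            result.append(a)
    return result
```

For a state such as `(1, 0, 0, 1)` only level 2 is returned: level 1 is free, but placing a key there would leave the free level 2 above the occupied level 3.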





Fig. 3. Example for a transition



Making a decision $a$ has two effects. First, level $a$ of the rightmost path
becomes occupied and second, the levels from $a + 1$ to $h_{\max} - 1$ become
unoccupied. So the definition of the transition function is:

$$T(s, a) := T_\nu(s, a) = (s_0, \dots, s_{a-1}, 1, 0, \dots, 0)^\top$$
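The transition can be sketched in a few lines, again assuming that states are tuples of 0/1 flags. The helper `replay` is a hypothetical convenience that applies a whole decision sequence to the initial state; it does not check feasibility.

```python
def transition(s, a):
    """T(s, a): keep levels above a, occupy level a, clear all deeper levels."""
    if not 0 <= a < len(s):
        raise ValueError("level out of range")
    return s[:a] + (1,) + (0,) * (len(s) - a - 1)

def replay(decisions, h_max):
    """Apply a decision sequence to the initial state (0, ..., 0)."""
    s = (0,) * h_max
    states = [s]
    for a in decisions:
        s = transition(s, a)
        states.append(s)
    return states

# The decision sequence DS = (1, 0, 2, 1) mentioned in the text, with h_max = 3.
states = replay((1, 0, 2, 1), h_max=3)
```

The list `states` then holds the initial state followed by the state reached after each decision.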

Figure 3 shows an example of a single transition. The following state and
decision sequence shows the transitions from the initial state to the right
tree of Figure 3:

s) ^ (!) =* Q =* (?) * (i 



If we have a state $s \in S_\nu$, we can deduce from $s$ the preceding
decision, i.e. the decision on stage $\nu - 1$ that induced $s$. Take a look
at the transition function $T(s, a)$: the largest $i$ with $s_i = 1$ defines
this preceding decision.



$$\mathrm{precdec}(s) := \begin{cases} 0 & \text{if } s = (0 \cdots 0) \\ \max\{0 \le i < h_{\max} \mid s_i = 1\} & \text{otherwise} \end{cases}$$







Our cost function $c_\nu(s, a)$ has to consider two aspects: the level of key
$k_\nu$ and the level of the gap $(k_{\nu-1}, k_\nu)$. The first is simple:
the level of key $k_\nu$ is determined by the decision $a$. With the following
lemma, we are able to determine the level of the gap $(k_{\nu-1}, k_\nu)$.

Lemma 1. Let $\mathrm{klevel}(k_\nu)$ denote the level of key $k_\nu$ and let
$\mathrm{glevel}(k_{\nu-1}, k_\nu)$ denote the level of the gap
$(k_{\nu-1}, k_\nu)$. Then we have

$$\mathrm{glevel}(k_{\nu-1}, k_\nu) = 1 + \max\{\mathrm{klevel}(k_{\nu-1}), \mathrm{klevel}(k_\nu)\}$$

Proof. Adjacent keys cannot be on the same level. So we have either
$\mathrm{klevel}(k_{\nu-1}) < \mathrm{klevel}(k_\nu)$ or
$\mathrm{klevel}(k_{\nu-1}) > \mathrm{klevel}(k_\nu)$.

In the case of $\mathrm{klevel}(k_{\nu-1}) < \mathrm{klevel}(k_\nu)$, the key
$k_\nu$ is in the right subtree of key $k_{\nu-1}$ and the gap
$(k_{\nu-1}, k_\nu)$ is the left son of the node that contains $k_\nu$. In the
other case the key $k_{\nu-1}$ is in the left subtree of key $k_\nu$ and the
gap $(k_{\nu-1}, k_\nu)$ is the right son of the node that contains
$k_{\nu-1}$. In both cases the equation of Lemma 1 is valid.
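Lemma 1 can also be checked empirically. The sketch below is not part of the paper: it builds random (height-unrestricted) binary search trees on the keys 1..n by plain insertion, records the level of every key and every gap, and tests the equation for all inner gaps. All names are hypothetical.

```python
import random

def levels_of_random_bst(n, rng):
    """Insert keys 1..n in random order into an unbalanced BST.

    Returns (key_level, gap_level): key_level[i] is the level of key i and
    gap_level[j] the level of the external node between keys j and j + 1.
    """
    left, right, key_level = {}, {}, {}
    order = list(range(1, n + 1))
    rng.shuffle(order)
    root = order[0]
    key_level[root] = 0
    for k in order[1:]:
        node, depth = root, 0
        while True:
            child = left if k < node else right
            depth += 1
            if node in child:
                node = child[node]       # descend one level
            else:
                child[node] = k          # attach k as a new leaf
                key_level[k] = depth
                break
    gap_level = {}
    for k in range(1, n + 1):
        if k not in left:                # external node left of k is gap (k-1, k)
            gap_level[k - 1] = key_level[k] + 1
        if k not in right:               # external node right of k is gap (k, k+1)
            gap_level[k] = key_level[k] + 1
    return key_level, gap_level

def lemma_holds(n, rng):
    key_level, gap_level = levels_of_random_bst(n, rng)
    return all(
        gap_level[j] == 1 + max(key_level[j], key_level[j + 1])
        for j in range(1, n)
    )

rng = random.Random(0)
all_ok = all(lemma_holds(n, rng) for n in range(2, 40) for _ in range(20))
```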

The cost functions $c_\nu(s, a)$ are defined by:

$$c_\nu(s, a) := (1 + \max\{\mathrm{precdec}(s), a\}) \cdot \alpha_{\nu-1} + (a + 1) \cdot \beta_\nu$$

This definition utilizes Lemma 1: $\mathrm{klevel}(k_{\nu-1})$ is equivalent
to $\mathrm{precdec}(s)$ and $\mathrm{klevel}(k_\nu)$ to the decision $a$.
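A minimal sketch of the stage cost, assuming tuple states and plain numeric weights. The function names are hypothetical, and returning 0 from `precdec` for the all-zero initial state is an assumption of this sketch; any value not larger than 0 yields the same stage costs.

```python
def precdec(s):
    """Level chosen by the preceding decision: the largest i with s[i] == 1.

    Assumption: the all-zero initial state is mapped to 0.
    """
    occupied = [i for i, bit in enumerate(s) if bit == 1]
    return max(occupied) if occupied else 0

def stage_cost(s, a, alpha_prev, beta_cur):
    """c_nu(s, a) = (1 + max(precdec(s), a)) * alpha_{nu-1} + (a + 1) * beta_nu."""
    return (1 + max(precdec(s), a)) * alpha_prev + (a + 1) * beta_cur

# Previous key on level 2, new key on level 1: the gap between them lies on
# level 3 and the new key costs two comparisons.
cost = stage_cost((1, 0, 1), 1, alpha_prev=2, beta_cur=5)
```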

The terminal costs $C_{n+1}$ model whether our final state fulfills the tree
conditions. In particular, we have to check whether the rightmost path
contains unoccupied levels above occupied levels. For instance, tree (c) of
Figure 1 is not a valid search tree because level 1 is not occupied but level
2 is. We have:

$$C_{n+1}(s) := \begin{cases} (1 + \mathrm{precdec}(s)) \cdot \alpha_n & \text{if } s_0 = 1 \text{ and } \nexists\, i < j : s_i = 0 \wedge s_j = 1 \\ \infty & \text{otherwise} \end{cases}$$

To check whether there exists an unoccupied level, we use an adaptation of
condition (ii) of the feasible decision set $D(s)$. If the root level is
occupied and there exists no unoccupied level above an occupied level, the
terminal costs consist of the access probability $\alpha_n$ of the last gap
multiplied by the level of key $k_n$ plus 1.
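The terminal cost can be sketched as follows, assuming tuple states. A final state is valid exactly when its occupied levels are 0, 1, ..., m for some m, which is how the check is written here; the function name is hypothetical.

```python
from math import inf

def terminal_cost(s, alpha_n):
    """C_{n+1}(s): cost of the last gap, or infinity for an invalid final state."""
    ones = sum(s)
    # Valid iff the occupied levels are exactly 0, 1, ..., ones - 1, i.e. the
    # root level is occupied and no free level lies above an occupied one.
    valid = ones > 0 and all(bit == 1 for bit in s[:ones])
    if not valid:
        return inf
    last_level = ones - 1          # level of key k_n in a valid final state
    return (1 + last_level) * alpha_n
```

For example, `(1, 1, 0)` is a valid final state, whereas `(1, 0, 1)` and `(0, 1, 1)` are rejected with infinite cost.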

Now the definition of the dynamic program DP is complete. Using this
definition, the optimization problem is

$$F := \sum_{\nu=1}^{n} c_\nu(s_\nu, a_\nu) + C_{n+1}(s_{n+1}) \to \min$$

subject to:

$$\begin{aligned} s_1 &= (0 \cdots 0) \\ a_\nu &\in D(s_\nu), && 1 \le \nu \le n \\ s_{\nu+1} &= T(s_\nu, a_\nu), && 1 \le \nu \le n \end{aligned}$$

The value $F$ of the objective function yields the minimum weighted path
length, and the tree is given by the optimal sequence $(a_1, \dots, a_n)$ of
feasible decisions.



4 Algorithm and Complexity 



For the solution of this optimization problem we use a common dynamic pro- 
gramming algorithm, cf. [8]. 

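Algorithm 1.

(0) /* Initialization */
(1) forall $s \in S_{n+1}$ do
(2)     $V_{n+1}(s) \leftarrow C_{n+1}(s)$
(3) /* Backward Computation */
(4) for $\nu \leftarrow n$ downto 1 do
(5)     forall $s \in S_\nu$ do
(6)         $V_\nu(s) \leftarrow \infty$
(7)         $\pi_\nu(s) \leftarrow$ undefined
(8)         forall $a \in D(s)$ do
(9)             if $c_\nu(s, a) + V_{\nu+1}(T(s, a)) < V_\nu(s)$ then
(10)                $V_\nu(s) \leftarrow c_\nu(s, a) + V_{\nu+1}(T(s, a))$
(11)                $\pi_\nu(s) \leftarrow a$
(12) /* Forward Computation */
(13) $s \leftarrow (0 \cdots 0)$
(14) $F \leftarrow V_1(s)$
(15) for $\nu \leftarrow 1$ to $n$ do
(16)     $a_\nu \leftarrow \pi_\nu(s)$
(17)     $s \leftarrow T(s, a_\nu)$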

$V_\nu(s)$ is the value function, which represents the minimal costs to reach
a terminal state from state $s$ on stage $\nu$. In lines (1) and (2) we
initialize the value function with the terminal costs. $\pi_\nu(s)$ represents
the optimal decision for state $s$ on stage $\nu$. The value function
$V_\nu(s)$ and the optimal decision $\pi_\nu(s)$ are determined by the Bellman
equation

$$V_\nu(s) = \min_{a \in D(s)} \{ c_\nu(s, a) + V_{\nu+1}(T(s, a)) \}$$

which is solved for all states on all stages in lines (4) to (11).

After the backward computation terminates, the $\pi_\nu$ define an optimal
policy. To get the optimal decision sequence we apply the $\pi_\nu$ in a
forward computation (lines (13) to (17)), beginning with our initial state. As
a result, the $a_\nu$ represent the decision sequence to build an optimal tree
and the value of $F$ is the weighted path length of the optimal tree.
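The backward and forward computation can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the authors' implementation: states are tuples, the state sets are restricted to states reachable from the initial state, weights are passed as 0-based lists, and all function names as well as the value 0 of `precdec` for the initial state are assumptions.

```python
from math import inf

def feasible(s):
    """D(s): free levels a such that the deeper levels are occupied, then free."""
    out = []
    for a, bit in enumerate(s):
        tail = s[a + 1:]
        # condition (ii): no free level in the tail is followed by an occupied one
        if bit == 0 and all(x >= y for x, y in zip(tail, tail[1:])):
            out.append(a)
    return out

def transition(s, a):
    return s[:a] + (1,) + (0,) * (len(s) - a - 1)

def precdec(s):
    return max((i for i, bit in enumerate(s) if bit), default=0)

def terminal_cost(s, alpha_n):
    ones = sum(s)
    if ones == 0 or any(bit == 0 for bit in s[:ones]):
        return inf
    return (1 + precdec(s)) * alpha_n

def optimal_tree(beta, alpha, delta=0):
    """Return (F, decisions) for key weights beta[0..n-1], gap weights alpha[0..n]."""
    n = len(beta)
    h_max = n.bit_length() + delta           # h_min(n) + delta
    start = (0,) * h_max
    # State sets S_1, ..., S_{n+1}, restricted to reachable states.
    S = [{start}]
    for _ in range(n):
        S.append({transition(s, a) for s in S[-1] for a in feasible(s)})
    # Initialization with the terminal costs.
    V = {s: terminal_cost(s, alpha[n]) for s in S[n]}
    policy = [None] * n
    # Backward computation over the stages n, ..., 1.
    for nu in range(n, 0, -1):
        V_nu, pi_nu = {}, {}
        for s in S[nu - 1]:
            V_nu[s], pi_nu[s] = inf, None
            for a in feasible(s):
                cost = ((1 + max(precdec(s), a)) * alpha[nu - 1]
                        + (a + 1) * beta[nu - 1] + V[transition(s, a)])
                if cost < V_nu[s]:
                    V_nu[s], pi_nu[s] = cost, a
        V, policy[nu - 1] = V_nu, pi_nu
    # Forward computation along the optimal policy.
    s, F, decisions = start, V[start], []
    for nu in range(1, n + 1):
        a = policy[nu - 1][s]
        if a is None:                         # no tree of the permitted height
            return inf, []
        decisions.append(a)
        s = transition(s, a)
    return F, decisions
```

With key weights 10, 2, 1 and zero gap weights, `optimal_tree([10, 2, 1], [0, 0, 0, 0], delta=0)` is forced to the balanced tree, while `delta=1` permits the cheaper right-leaning chain.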

With the decision sequence $DS = (a_1, \dots, a_n)$ that defines the optimal
binary search tree, we are able to build the corresponding tree in linear
time, since for each key $k_\nu$ the level where $k_\nu$ has to be placed is
given by the decision $a_\nu$.
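One possible way to carry out this construction is a single left-to-right pass that tracks the state vector and the last key placed on each level. The sketch below is an assumption about how the linking could be done, not the paper's procedure; keys are represented by their indices 1..n, and the decision sequence is assumed to be feasible and to end in a valid final state.

```python
def build_tree(decisions, h_max):
    """Link keys 1..n into a tree from their levels in one left-to-right pass.

    Returns (root, left, right), where left/right map a key index to its son.
    """
    left, right = {}, {}
    occupied = [0] * h_max        # state vector s of the decision model
    last = [None] * h_max         # last key placed on each level
    prev = -1                     # level of the previously placed key
    for key, a in enumerate(decisions, start=1):
        # The fragment hanging directly below level a becomes the left subtree.
        if a + 1 < h_max and occupied[a + 1]:
            left[key] = last[a + 1]
        # If the level above is on the rightmost path, the key is its right son.
        if a > 0 and occupied[a - 1]:
            right[last[a - 1]] = key
        # Transition: clear the deeper levels, then occupy level a.
        for lvl in range(a + 1, prev + 1):
            occupied[lvl] = 0
        occupied[a] = 1
        last[a] = key
        prev = a
    return last[0], left, right

# Decision sequence (1, 2, 0, 1) with h_max = 3.
root, left, right = build_tree((1, 2, 0, 1), h_max=3)
```

Each cleared level was occupied by exactly one earlier placement, so the clearing loop does a total amount of work proportional to the number of keys.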

Example 1. Suppose we have keys ki,...,k& with access probabilities j3\ = 
jg, P2 = jki03 = \, Pa = \ and olq = • • • = = 0. Let A — 0, that means we 
have to construct a tree of height $\lceil \log_2(5) \rceil = 3$.



Fig. 4. State space for the example problem 




Fig. 5. Optimal binary search tree for the example problem 



Figure 4 shows the search graph for this problem. The number adjacent to an
arc represents the cost $c_\nu(s, a)$ of the corresponding transition. The
terminal costs $C_5(s)$ are shown below the states of state set $S_5$ and the
value function $V_\nu(s)$ is shown right beside the states for the state sets
$S_1$ to $S_4$. Observe that the value function of state
$(1, 1, 1) \in S_4$ yields $\infty$ because of an empty decision set.

The best decision sequence DS = (1,2,0,1) is given by the bold arcs. Its 
overall cost is ||, that means the corresponding optimal binary search tree has
a weighted path length of ||. Figure 5 shows the corresponding tree.

Our complexity results are based on bounds for the cardinality of the state
sets $S_\nu$ and the decision sets $D_\nu$.

Theorem 1. For all state sets $S_\nu$ ($\nu = 1, \dots, n + 1$) we have:

$$|S_\nu| \le 2^{\Delta+1}(n + 1)$$

Proof. Let $h_{\max}(n) := h_{\min}(n) + \Delta$ and
$S := \{0, 1\}^{h_{\max}(n)}$. With these definitions we get

$$|S_\nu| \le |S| = 2^{h_{\max}(n)} = 2^{h_{\min}(n)+\Delta}$$

Using $h_{\min}(n) = \lceil \log_2(n + 1) \rceil$ we get

$$\begin{aligned} |S| &\le 2^{\lceil \log_2(n+1) \rceil + \Delta} \\ &\le 2^{\Delta+1} \cdot 2^{\log_2(n+1)} \\ &= 2^{\Delta+1} \cdot (n + 1) \end{aligned}$$



Corollary 1. For any fixed $\Delta$ the cardinality of the state sets $S_\nu$
is bounded by $O(n)$.

Theorem 2. For all feasible decision sets $D_\nu$ ($\nu = 1, \dots, n$) we
have:

$$|D_\nu| \le 2^{\Delta+2}(n + 1)$$

Proof. Let $h_{\max}(n) := h_{\min}(n) + \Delta$,
$S := \{0, 1\}^{h_{\max}(n)}$ and
$D := \{(s, a) \mid s \in S,\ a \text{ is feasible for } s\}$. With these
definitions we get $|D_\nu| \le |D|$ for all $\nu = 1, \dots, n$.

How many feasible decisions exist for a state $s \in S$? Take a look at
condition (ii) in the definition of $D(s)$ (see Section 3). If
$s_{h_{\max}-1} = 1$ there is at most one feasible decision $a$, which is
determined by the highest index $a$ with $s_a = 0$. That means that half of
all the states in $S$ have at most one feasible decision. States with
$s_{h_{\max}-1} = 0$ and $s_{h_{\max}-2} = 1$, which comprise a quarter of all
states in $S$, have at most two decisions. In general, $\frac{1}{2^k}|S|$ of
the states in $S$ have at most $k$ feasible decisions. We get:

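$$\begin{aligned} |D_\nu| &\le |D| \\ &\le 1 \cdot \tfrac{1}{2}|S| + 2 \cdot \tfrac{1}{4}|S| + \cdots \\ &\le \sum_{k=0}^{\infty} k \cdot \frac{1}{2^k}\,|S| \\ &= \left( \sum_{k=0}^{\infty} k \left(\tfrac{1}{2}\right)^k \right) |S| \\ &= \frac{\frac{1}{2}}{\left(1 - \frac{1}{2}\right)^2}\,|S| \\ &= 2 \cdot |S| \\ &\le 2^{\Delta+2}(n + 1) \end{aligned}$$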
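The counting argument can be cross-checked by brute force for small vector lengths. The sketch below enumerates every binary vector of a given length, counts the decisions that satisfy conditions (i) and (ii) of Section 3, and compares the total with twice the number of vectors. The function names are hypothetical.

```python
from itertools import product

def count_feasible(s):
    """|D(s)| under conditions (i) and (ii) of Section 3."""
    count = 0
    for a, bit in enumerate(s):
        tail = s[a + 1:]
        # (i) level a is free; (ii) deeper levels are occupied first, then free
        if bit == 0 and all(x >= y for x, y in zip(tail, tail[1:])):
            count += 1
    return count

def total_decisions(h_max):
    """|D|: number of pairs (s, a) with s in {0, 1}^h_max and a feasible for s."""
    return sum(count_feasible(s) for s in product((0, 1), repeat=h_max))

# Totals for vector lengths 1, ..., 10.
totals = {h: total_decisions(h) for h in range(1, 11)}
```

For every length in this range the total stays below twice the number of vectors, in line with the bound derived in the proof.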

Corollary 2. For any fixed $\Delta$, Algorithm 1 constructs an optimal binary
search tree with height $h \le h_{\min}(n) + \Delta$ in time $O(n^2)$.

Proof. We have to iterate over the $n$ stages from $n$ down to 1. In doing so,
the cardinality of each state set $S_\nu$ and each feasible decision set
$D_\nu$ is bounded by $O(n)$ for fixed $\Delta$. All operations can be
executed in constant time. It follows that the overall running time is
$O(n^2)$.




5 Summary 

We have presented a quadratic time algorithm to compute optimal binary search
trees with near minimal height, i.e. with height
$h \le h_{\min}(n) + \Delta$ for fixed $\Delta$. The algorithm was adapted
from the construction algorithm for optimal B-trees. The construction process
was modeled by a decision-oriented dynamic program: in the model we have to
decide key by key on which level the key should be placed. The tree conditions
are represented by additional constraints and a terminal cost function.

It seems to be easy to apply this approach to other kinds of trees. By
applying the construction algorithm of pQ, it should be possible to construct
optimal B-trees with near minimal height and fixed order in quadratic time,
too. The construction of unrestricted optimal B-trees needs time
$O(n^{2 + \frac{\log 2}{\log(k+1)}})$. A generalization of the binary tree
model to multiway trees of a fixed order should also lead to a quadratic time
algorithm, in contrast to the cubic time algorithms for the unrestricted case
[H. This means for both cases that optimal trees with near minimal height can
be constructed faster than unrestricted trees. If we consider that optimal
trees typically have a low height, the approach of height restriction may lead
to fast construction algorithms which generate optimal trees with high
probability.